# EEGUnity Kernel Tutorial: Rich Metadata, `misc`, `stim`, and Annotations

This tutorial explains how to use EEGUnity kernels to inject dataset-specific metadata and channels in memory.

## 1. Design Principle

EEGUnity keeps **locator metadata as source of truth**:

- `format_channel_names()` standardizes locator channels as `channel_type:channel_name`.
- `get_data_row()` uses locator metadata to overwrite raw metadata at load time.
- Kernels are applied **after** locator-driven metadata patching.

This allows online metadata maintenance without modifying source files.

## 2. What a Kernel Can Do

A kernel can:

- add or update `raw.info["description"]`
- add or adjust multiple `misc` channels
- add or adjust multiple `stim` channels
- add/update annotations

Kernel interface:

```python
class SomeKernel:
    def apply(self, udataset, raw, row):
        ...
        return raw

KERNEL = SomeKernel()
```

## 3. Annotation vs `misc` vs `stim`

Use these three mechanisms for different semantics:

- `Annotations`: text labels mapped to time segments (`onset`, `duration`, `description`).
- `misc` channels: continuous values over time (for example probability density, reaction-time trajectory).
- `stim` channels: integer event codes over time (for example class sequence 1/2/3).

For a single scalar value for one segment, fill the covered segment in a `misc` channel.

## 4. Example Kernel with Multiple `misc` and `stim` Channels

```python
from __future__ import annotations
from dataclasses import dataclass
import numpy as np
import mne


def add_channel(raw: mne.io.BaseRaw, ch_name: str, ch_type: str, values: np.ndarray) -> mne.io.BaseRaw:
    """Append one channel to raw with explicit MNE channel type."""
    if values.ndim != 1:
        raise ValueError("values must be a 1D array")
    if values.shape[0] != raw.n_times:
        raise ValueError("values length must equal raw.n_times")

    info = mne.create_info([ch_name], sfreq=raw.info["sfreq"], ch_types=[ch_type])
    ch_raw = mne.io.RawArray(values[np.newaxis, :], info, verbose=False)
    raw.add_channels([ch_raw], force_update_info=True)
    return raw


@dataclass
class ExampleKernel:
    KERNEL_ID: str = "example_rich_meta"

    def apply(self, udataset, raw: mne.io.BaseRaw, row):
        n = raw.n_times

        # misc channels (continuous signals)
        prob_density = np.linspace(0.1, 0.9, n, dtype=float)
        reaction_time = np.full(n, 0.42, dtype=float)
        raw = add_channel(raw, "prob_density", "misc", prob_density)
        raw = add_channel(raw, "reaction_time", "misc", reaction_time)

        # stim channels (integer codes)
        task_code = np.zeros(n, dtype=float)
        task_code[n // 4: n // 2] = 1
        task_code[n // 2: 3 * n // 4] = 2
        task_code[3 * n // 4:] = 3

        stage_code = np.zeros(n, dtype=float)
        stage_code[n // 3: 2 * n // 3] = 7

        raw = add_channel(raw, "task_code", "stim", task_code)
        raw = add_channel(raw, "stage_code", "stim", stage_code)

        # annotation segments (text semantics)
        ann = mne.Annotations(
            onset=[0.0, raw.times[n // 2]],
            duration=[2.0, 2.0],
            description=["trial_start", "feedback"],
        )
        raw.set_annotations(ann)

        return raw


KERNEL = ExampleKernel()
```

## 5. Binding and Running

```python
from eegunity import UnifiedDataset

ud = UnifiedDataset(
    dataset_path=r"path/to/dataset",
    domain_tag="my_dataset",
    kernel_spec=r"path/to/example_kernel.py",
)

# Parser path
raw0 = ud.eeg_parser.get_data(0)

# Batch path (kernel is also applied when loading row data in batch methods)
ud.eeg_batch.get_file_hashes(data_stream=True)
```

## 6. Channel Type Compatibility

EEGUnity standard prefixes are lowercase MNE-style (`eeg`, `eog`, `emg`, `ecg`, `meg`, `stim`, `misc`, `bio`) and it also accepts explicit MNE channel type strings in locator entries, for example:

- `seeg:LA1`
- `ecog:G1`
- `dbs:DBS1`
- `fnirs_od:S1_D1_760`
- `pupil:pupil_left`
- `misc:prob_density`
- `stim:task_code`

Legacy uppercase prefixes (`EEG`, `EOG`, `EMG`, `ECG`, `STIM`, `Unknown`) are accepted for backward compatibility.

## 7. Recommended Practice

- Use annotations for semantic event intervals.
- Use `stim` for integer-coded sequences.
- Use `misc` for continuous labels.
- Keep kernel logic dataset-specific and deterministic.